As drone costs decrease and drone technology improves, drone detection has become an important object detection task. However, it is hard to detect distant drones under weak contrast and low visibility at long range. In this work, we propose several sequence classification architectures to reduce the false-positive ratio of detected drone tracks. Furthermore, we propose a new drone vs. bird sequence classification dataset to train and evaluate the proposed architectures. 3D CNN, LSTM, and Transformer-based sequence classification architectures have been trained on the proposed dataset to show the effectiveness of the proposed ideas. As the experiments show, using sequence information improves the bird classification and overall F1 scores by 73% and 35%, respectively. Among all sequence classification models, the R(2+1)D-based fully convolutional model yields the best transfer learning and fine-tuning results.
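As a rough illustration of the best-performing setup described above, the following sketch fine-tunes torchvision's Kinetics-pretrained R(2+1)D-18 backbone as a two-class (drone vs. bird) sequence classifier. This is a minimal example under assumed clip size and hyperparameters, not the authors' code.

```python
# Minimal sketch: fine-tune torchvision's R(2+1)D-18 for drone-vs-bird clips.
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18, R2Plus1D_18_Weights

# Load a Kinetics-pretrained backbone and replace the classification head.
model = r2plus1d_18(weights=R2Plus1D_18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # drone vs. bird

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batch: (batch, channels, frames, height, width); real clips would be
# cropped track sequences resized to the backbone's expected resolution.
clips = torch.randn(4, 3, 16, 112, 112)
labels = torch.tensor([0, 1, 0, 1])

logits = model(clips)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```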
Detection of small and distant objects in a scene is a major challenge in surveillance applications. Such objects are represented by a small number of pixels in the image and lack sufficient detail, making them hard to detect with conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed, which provides a generic slicing-aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without requiring any fine-tuning. Experimental evaluations, using object detection baselines on the Visdrone and xView aerial object detection datasets, show that the proposed inference method increases object detection AP by 6.8%, 5.1%, and 5.3% for the FCOS, VFNet, and TOOD detectors, respectively. Moreover, the detection accuracy can be further improved with slicing-aided fine-tuning, resulting in cumulative gains of 12.7%, 13.4%, and 14.5% AP in the same order. The proposed technique has been integrated with Detectron2, MMDetection, and YOLOv5 models, and it is publicly available at https://github.com/obss/sahi.git.
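For reference, a minimal usage sketch of the released sahi package is shown below. It follows the package's documented interface (class and argument names may vary slightly across sahi versions); the checkpoint path, slice sizes, and thresholds are placeholders.

```python
# Slicing-aided inference with sahi on top of an off-the-shelf detector.
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap an existing detector (here a YOLOv5 checkpoint) without any retraining.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="yolov5s.pt",      # placeholder checkpoint path
    confidence_threshold=0.3,
    device="cuda:0",
)

# The image is split into overlapping slices, each slice is run through the
# detector, and the per-slice predictions are merged back into full-image
# coordinates.
result = get_sliced_prediction(
    "demo_image.jpg",
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(len(result.object_prediction_list), "objects detected")
```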
As the use of drones increases with lowered costs and improved drone technology, drone detection emerges as an important object detection task. However, detecting distant drones under unfavorable conditions, namely weak contrast, long range, and low visibility, requires effective algorithms. Our method approaches the drone detection problem by fine-tuning a YOLOv5 model and by using a Kalman-based object tracker to boost detection confidence. Our results show that augmenting the real data with an optimal subset of synthetic data can improve performance. Moreover, temporal information gathered by the object tracking method can further improve performance.
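The abstract does not spell out how the tracker boosts confidence; the sketch below only illustrates the general idea that detections repeatedly associated with the same track accumulate evidence over consecutive frames. The association itself would come from a Kalman-based tracker (omitted here), and the boosting rule and constants are invented for illustration.

```python
# Illustrative track-level confidence boosting (not the paper's exact scheme).
from dataclasses import dataclass, field

@dataclass
class Track:
    track_id: int
    confidences: list = field(default_factory=list)

    def update(self, det_conf: float) -> float:
        """Return a boosted confidence that grows with track length."""
        self.confidences.append(det_conf)
        recent = self.confidences[-10:]
        # Average recent per-frame confidences and add a small bonus per
        # consecutive hit, capped at 1.0 (purely illustrative rule).
        boosted = sum(recent) / len(recent) + 0.02 * min(len(self.confidences), 10)
        return min(boosted, 1.0)

track = Track(track_id=0)
for frame_conf in [0.35, 0.40, 0.38, 0.42]:   # raw per-frame detector confidences
    print(track.update(frame_conf))
```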
While exam-style questions are a fundamental educational tool serving a variety of purposes, manually constructing questions is a complex process that requires training, experience, and resources. To reduce the expenses associated with manual construction and to satisfy the need for a continuous supply of new questions, automatic question generation (QG) techniques can be used. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG, and answer extraction tasks using Turkish QA datasets. To the best of our knowledge, this is the first academic work to perform automatic text-to-text question generation from Turkish texts. Evaluation results show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on the TQuADv1 and TQuADv2 datasets and the XQuAD Turkish split. Source code and pre-trained models are available at https://github.com/obss/turkish-question-generation.
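A minimal sketch of the multi-task text-to-text setup with Hugging Face transformers is shown below; the task prefixes and examples are illustrative assumptions rather than the repository's exact input format.

```python
# Multi-task text-to-text fine-tuning sketch with mT5 (illustrative prefixes).
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Each task (QG, QA, answer extraction) is cast as text-to-text with a prefix.
examples = [
    ("answer: Ankara context: Türkiye'nin başkenti Ankara'dır.",
     "Türkiye'nin başkenti neresidir?"),                       # question generation
    ("question: Türkiye'nin başkenti neresidir? context: Türkiye'nin başkenti Ankara'dır.",
     "Ankara"),                                                # question answering
]

for source, target in examples:
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()   # in practice, batching and an optimizer step follow
```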
As part of the Data-Centric AI Competition, we propose a data-centric approach to improve the diversity of the training samples by iterative sampling. The method itself relies strongly on the fidelity of the augmented samples and on the diversity of the augmentation methods. Moreover, we further improve performance by introducing more samples for the difficult classes, in particular by providing samples closer to the edge cases that the model at hand is likely to misclassify.
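A hedged sketch of the allocation step is shown below: after each training round, the augmentation budget for the next round is distributed in proportion to per-class error, so difficult classes and edge cases receive more new samples. The function name and numbers are illustrative only.

```python
# Allocate the next round's augmentation budget toward difficult classes.
import numpy as np

def next_round_quota(error_rate_per_class, total_new_samples):
    """Split the augmentation budget in proportion to per-class error."""
    errors = np.asarray(error_rate_per_class, dtype=float)
    weights = errors / errors.sum() if errors.sum() > 0 else np.ones_like(errors) / len(errors)
    return (weights * total_new_samples).round().astype(int)

# Example: class 2 is misclassified most often, so it gets the largest share.
print(next_round_quota([0.05, 0.10, 0.40, 0.15], total_new_samples=200))
# -> [ 14  29 114  43]
```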
Object detectors are conventionally trained by a weighted sum of classification and localization losses. Recent studies (e.g., predicting IoU with an auxiliary head, Generalized Focal Loss, Rank & Sort Loss) have shown that forcing these two loss terms to interact with each other in non-conventional ways creates a useful inductive bias and improves performance. Inspired by these works, we focus on the correlation between classification and localization and make two main contributions: (i) We provide an analysis of the effects of correlation between classification and localization tasks in object detectors. We identify why correlation affects the performance of various NMS-based and NMS-free detectors, and we devise measures to evaluate the effect of correlation and use them to analyze common detectors. (ii) Motivated by our observations, e.g., that NMS-free detectors can also benefit from correlation, we propose Correlation Loss, a novel plug-in loss function that improves the performance of various object detectors by directly optimizing correlation coefficients: e.g., Correlation Loss on Sparse R-CNN, an NMS-free method, yields 1.6 AP gain on COCO and 1.8 AP gain on the Cityscapes dataset. Our best model on Sparse R-CNN reaches 51.0 AP without test-time augmentation on COCO test-dev, achieving state-of-the-art performance. Code is available at https://github.com/fehmikahraman/CorrLoss.
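As a simplified illustration (the paper studies several correlation coefficients and integrates the loss into full detectors), a plug-in Pearson-style correlation term between per-detection classification scores and localization quality could be sketched as follows.

```python
# Simplified Pearson-correlation loss between scores and IoUs (illustrative).
import torch

def correlation_loss(cls_scores: torch.Tensor, ious: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """1 - Pearson correlation between per-detection scores and IoUs."""
    s = cls_scores - cls_scores.mean()
    q = ious - ious.mean()
    corr = (s * q).sum() / (s.norm() * q.norm() + eps)
    return 1.0 - corr

# Toy usage: scores that do not follow localization quality yield a high loss;
# in a detector this term is added to the usual classification/box losses.
scores = torch.tensor([0.9, 0.8, 0.3, 0.2], requires_grad=True)
ious   = torch.tensor([0.4, 0.9, 0.8, 0.3])
loss = correlation_loss(scores, ious)
loss.backward()
```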
Person re-identification is a challenging task because of the high intra-class variance induced by the unrestricted nuisance factors of variations such as pose, illumination, viewpoint, background, and sensor noise. Recent approaches postulate that powerful architectures have the capacity to learn feature representations invariant to nuisance factors, by training them with losses that minimize intra-class variance and maximize inter-class separation, without modeling nuisance factors explicitly. The dominant approaches use either a discriminative loss with margin, like the softmax loss with the additive angular margin, or a metric learning loss, like the triplet loss with batch hard mining of triplets. Since the softmax imposes feature normalization, it limits the gradient flow supervising the feature embedding. We address this by joining the losses and leveraging the triplet loss as a proxy for the missing gradients. We further improve invariance to nuisance factors by adding the discriminative task of predicting attributes. Our extensive evaluation highlights that when only a holistic representation is learned, we consistently outperform the state-of-the-art on the three most challenging datasets. Such representations are easier to deploy in practical systems. Finally, we found that joining the losses removes the requirement for having a margin in the softmax loss while increasing performance.
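A simplified sketch of the joint objective, assuming an ArcFace-style additive angular margin and batch-hard triplet mining (the exact formulation in the paper may differ), is given below.

```python
# Joint margin-softmax + batch-hard triplet loss on the same embeddings (sketch).
import torch
import torch.nn.functional as F

def arc_margin_logits(emb, weight, labels, s=30.0, m=0.5):
    """Cosine logits with an additive angular margin on the target class."""
    cos = F.normalize(emb) @ F.normalize(weight).t()
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weight.size(0)).bool()
    return s * torch.where(target, torch.cos(theta + m), cos)

def batch_hard_triplet(emb, labels, margin=0.3):
    """Hardest positive / hardest negative per anchor within the batch."""
    dist = torch.cdist(emb, emb)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()

emb = torch.randn(8, 128, requires_grad=True)            # backbone embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])          # identity labels
class_weight = torch.randn(4, 128, requires_grad=True)   # classifier weights (4 identities)

# Triplet loss acts on the raw embeddings, supplying the gradients that the
# normalized margin-softmax alone would suppress.
loss = F.cross_entropy(arc_margin_logits(emb, class_weight, labels), labels) \
     + batch_hard_triplet(emb, labels)
loss.backward()
```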
The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) have increased their adoption in various applications, such as wildfire tracking and border monitoring. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications due to connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part being executed on a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes critical, particularly for reducing the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce the data transmission latency by communicating with devices in close proximity while mitigating transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints due to the computational load and memory usage of the inference requests. The model is formulated as an optimization problem that aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
Continuous long-term monitoring of motor health is crucial for the early detection of abnormalities such as bearing faults (up to 51% of motor failures are attributed to bearing faults). Despite the numerous methodologies proposed for bearing fault detection, most of them require both normal (healthy) and abnormal (faulty) data for training. Even with recent deep learning (DL) methodologies trained on labeled data from the same machine, the classification accuracy deteriorates significantly when one or a few conditions are altered. Furthermore, their performance suffers significantly, or they may fail entirely, when tested on another machine with entirely different healthy and faulty signal patterns. To address this need, in this pilot study, we propose a zero-shot bearing fault detection method that can detect any fault on a new (target) machine regardless of the working conditions, sensor parameters, or fault characteristics. To accomplish this objective, a 1D Operational Generative Adversarial Network (Op-GAN) first characterizes the transition between normal and faulty vibration signals of (a) source machine(s) under various conditions, sensor parameters, and fault types. Then, for a target machine, potential faulty signals can be generated, and over its actual healthy and synthesized faulty signals, a compact and lightweight 1D Self-ONN fault detector can be trained to detect the real faulty condition in real time whenever it occurs. To validate the proposed approach, a new benchmark dataset is created using two different motors working under different conditions and sensor locations. Experimental results demonstrate that this novel approach can accurately detect any bearing fault, achieving average recall rates of around 89% and 95% on the two target machines regardless of the fault's type, severity, and location.
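An illustrative sketch only: the compact 1D Self-ONN detector is replaced here by a plain 1D CNN, and the Op-GAN-synthesized faulty signals are replaced by random placeholders, to show how the detector would be trained on the target machine's healthy data plus synthesized faults.

```python
# Train a lightweight 1D detector on real healthy + synthesized faulty windows.
import torch
import torch.nn as nn

detector = nn.Sequential(                       # stand-in for the 1D Self-ONN
    nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=9, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 2),                           # healthy vs. faulty
)

healthy = torch.randn(32, 1, 1024)              # real healthy vibration windows
synth_faulty = torch.randn(32, 1, 1024)         # placeholder for Op-GAN outputs

x = torch.cat([healthy, synth_faulty])
y = torch.cat([torch.zeros(32, dtype=torch.long), torch.ones(32, dtype=torch.long)])

loss = nn.functional.cross_entropy(detector(x), y)
loss.backward()
```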
Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first fast and widely-applicable pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall filters out all off-target reads before basecalling, and the highly accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance of the state-of-the-art basecaller by 3.31x while maintaining high (98.88%) sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, sensitivity, and generality compared to prior works. We freely open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall.
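A conceptual sketch of the pre-basecalling filter idea (not TargetCall's implementation) is shown below; the helper functions are crude stand-ins for LightCall and Similarity Check, which in the actual tool are a lightweight neural basecaller and an alignment-based check.

```python
# Conceptual pre-basecalling filter: keep only signals whose noisy reads look
# on-target, so the expensive basecaller runs on a small fraction of signals.
from difflib import SequenceMatcher

def light_basecall(signal: str) -> str:
    """Stand-in for LightCall: here the 'signal' is already a noisy read."""
    return signal

def similarity(read: str, reference: str) -> float:
    """Stand-in for Similarity Check: crude sequence similarity in [0, 1]."""
    return SequenceMatcher(None, read, reference).ratio()

def pre_basecalling_filter(raw_signals, target_reference, min_similarity=0.8):
    """Discard off-target signals before accurate (slow) basecalling."""
    return [s for s in raw_signals
            if similarity(light_basecall(s), target_reference) >= min_similarity]

reference = "ACGTACGTACGTACGT"
signals = ["ACGTACGAACGTACGT",   # near-match -> kept for accurate basecalling
           "TTTTGGGGCCCCAAAA"]   # off-target -> discarded before basecalling
print(pre_basecalling_filter(signals, reference))
```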